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Abstract. Let P = (Pi , Pa , . . . , P n ), Pi £ R for all i, be a signal and let C be a 
constant. In this work our goal is to find a function F : [n] — > R which optimizes 
the following objective function: 

n 

nun^(Pj - F,) 2 + C x |{* : F, / F +1 }\ (1) 

i=l 

The above optimization problem reduces to solving the following recurrence, 
which can be done efficiently using dynamic programming in 0(n 2 ) time: 

OPTi = min 

0<j<i-l 

The above recurrence arises naturally in applications where we wish to approx- 
imate the original signal P with another signal F which consists ideally of few 
piecewise constant segments. Such applications include database (e.g., histogram 
construction), speech recognition, biology (e.g., denoising aCGH data) applica- 
tions and many more. 

In this work we present two new techniques for optimizing dynamic program- 
ming that can handle cost functions not treated by other standard methods. The 
basis of our first algorithm is the definition of a constant-shifted variant of the ob- 
jective function that can be efficiently approximated using state of the art methods 
for range searching. Our technique approximates the optimal value of our objec- 
tive function within additive e error and runs in 0(n 15 log (^)) time, where 
U = maxi fi. The second algorithm we provide solves a similar recurrence 
within a factor of e and runs in 0(n log 2 n/e). The new technique introduced 
by our algorithm is the decomposition of the initial problem into a small (log- 
arithmic) number of Monge optimization subproblems which we can speed up 
using existing techniques. 

1 Introduction 

Dynamic programming is a widely used problem solving technique with applications 
in various fields such as operations research, biology, speech recognition, time series 
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analysis and many others. Due to its importance a lot of work has focused on speeding 
up naive dynamic programming implementations. Such techniques include the Knuth- 
Yao technique[21,28,29], a special case of the use of totally monotone matrices [5], 
the SMAWK algorithm for finding the row-minima of totally monotone matrices [ ■], 
and several other techniques exploiting special properties such as the convexity and 
concavity [12,15]. The basis of these techniques lies in the theory of Monge properties 
for optimization [8], which date back to the French mathematician Monge [23]. 

In this work, we consider the following recurrence, where P G R™ and C is a 
constant: 



OPT = 0, OPTi = min 

0<j<i-l 

This recurrence arises naturally in several applications where one wants to approx- 
imate a given signal / with a signal F which ideally consists of few piecewise con- 
stant segments. Such applications include histogram construction in databases (e.g., 
[20,17,18]), determining DNA copy numbers in cancer cells from micro-array data 
(e.g., [24,26]), speech recognition, data mining (e.g., [25]) and many others. 

In this work we present two new techniques for optimizing dynamic programming 
that can handle cost functions not treated by other standard methods. The basis of our 
first algorithm is the definition of a constant-shifted variant of the objective function 
that can be efficiently approximated using state of the art methods for range searching. 
Our technique approximates the optimal value of our objective function within additive 
e error and runs in 0(n 15 log (— )) time, where U = max^ /j. The second algorithm we 
provide solves a similar recurrence within a factor of e and runs in 0(n log 2 n/e). The 
new technique introduced by our algorithm is the decomposition of the initial problem 
into a small (logarithmic) number of Monge optimization subproblems which we can 
speed up using existing techniques. 

The remainder of the paper is organized as follows: Section 2 presents briefly exist- 
ing work on the problem and the necessary background. Section 3 presents our proposed 
algorithms and their theoretical analysis and Section 4 concludes the paper with a brief 
summary and discussion. 

2 Background 

In this section, we briefly summarize existing work on speeding up dynamic program- 
ming. First, we briefly present existing work on speeding up dynamic programming. 
Then, we present state of the art results on reporting points in halfspaces, an important 
case of range queries in computational geometry, and on optimizing Monge functions. 

2.1 Speeding up Dynamic Programming 

Dynamic programming is a powerful problem solving technique introduced by Bellman 
[6] with numerous applications in biology (e.g., [24,19,27]), in control theory (e.g., [7]), 



OPT, 




Em=j+1 P" 



+C, for i > 



in operations research and many other fields. Due to its importance, a lot of research has 
focused on speeding up naive dynamic programming implementations. A successful ex- 
ample of speeding up a naive dynamic programming implementation is the computation 
of optimal binary search trees. Gilbert and Moore solved the problem efficiently using 
dynamic programming [16]. Their algorithm runs in 0(n 3 ) time and for several years 
this running time was considered to be tight. In 1971 Knuth [21] showed that the same 
computation can be carried out in 0(n 2 ) time. This remarkable result was generalized 
by Frances Yao in [28,29]. Specifically, Yao showed that this dynamic programming 
speedup technique works for a large class of recurrences. She considered the recur- 
rence c(i, i) = 0, c(i, j) = mirij<fc<j (c(i, k — 1) + c(k,jj) + w(i,j) for i < j where 
the weight function w satisfies the quadrangle inequality (see Section 2.3) and proved 
that the solution of this recurrence can be found in 0(n 2 ) time. Eppstein, Galil and 
Giancarlo have considered similar recurrences where they showed that naive 0(n 2 ) 
implementations of dynamic programming can run in 0(n\ogn) time [12]. Further- 
more, Eppstein, Galil, Giancarlo and Italiano have also explored the effect of sparsity 
[13,14], another key concept in speeding up dynamic programming. Aggarwal, Klawe, 
Moran, Shor, Wilber developed an algorithm, widely known as the SMAWK algorithm, 
[ ] which can compute in 0(n) time the row maxima of a totally monotone n x n ma- 
trix. The connection between the Knuth- Yao technique and the SMAWK algorithm was 
made clear in [5], by showing that the Knuth- Yao technique is a special case of the use 
of totally monotone matrices. The basic properties which allow these speedups are the 
convexity or concavity of the weight function. The study of such properties data back 
to Monge [23] and are well studied in the literature, see for example [8]. 

Close to our work, lies the work on histogram construction, an important problem 
for database applications. Jagadish et al. [20] originally provided a simple dynamic 
programming algorithm which runs in 0(kn 2 ) time, where k is the number of buckets 
and n the input size and outputs the best V-optimal histogram. Guha, Koudas and Kim 
[17,18] propose a (1 + e) approximation algorithm which runs in linear time. Their al- 
gorithms exploits monotonicity properties of the key quantities involved in the problem. 

2.2 Reporting Points in a Halfspace 

Let S be a set of points in d-dimensional space M. d . Consider the problem of prepro- 
cessing S such that for any halfspace query 7 we can report the set S n 7 efficiently. 
This problem is a special case of range searching. For an extensive survey see [1]. For 
d = 2, the problem has been solved optimally by [9]. For d = 3, [10] gave a solu- 
tion with nearly linear space and 0(logn + k) query time, while [2] gave a solution 
with a more expensive preprocessing but 0(n\ogn) space. For dimensions d > 4, 
[11] gave an algorithm that requires with 0(nL d / 2 J+ e ) preprocessing time and space, 
where e is an arbitrarily small positive constant, and can subsequently answer queries in 
0(log n + k) time. Matousek in [22] provides improved results on the problem, which 
are conjectured to be optimal up to 0(n e ) or polylogarithmic factors. Now, we refer to 
the main theorem of [22] on the emptiness decision problem, i.e., determining whether 
S n 7 empty or not, phrased as theorem 1.3: 

Theorem 1. [22] Given a set of S of n points in WL d , d > 4, one can build in time 
0(n 1+s ) a linear size data structure which can decide whether a query halfspace con- 



tains a point of S in time 0(n where 6 > is an arbitrarily small 

constant and c' — c! (d) is a constant depending on the dimension. 

2.3 Monge Functions and Dynamic Programming 

Here, we refer to a theorem in [12] which we use in Section 3.2. A function w defined 
on pairs of integer indices is Monge if for any 4-tuple of indices i± < i-x < 13 < i^, 
w(ix,ii) + w{i 2 ,i^) > w(i\,iz) + w(i2,ii). Then the following Theorem holds: 

Theorem 2 ([12]). 

Given a Monge function w : {0 . . . n}x{0 . . . n} — > K, and a vector (ao, a\ . . . a n _i) 
the value ofmhij < i{aj + w(j, i)} can be calculated for each i = 1, . . . , n given the 
previous values ao, ai, . . . a^-i in O(nlogn) time total. 

3 Proposed Method 

In the following, let the initial vector be (Pi, . . . ,P n ) and let Si = Pj- The 

transition function for the dynamic programming for i > can be rewritten as: 

OPT, = min J<4 OPT, + min x YL=j+i( P ™ - x ) 2 + C 

= win^ OPT, + £ l m= . +1 Pi ^f- + C (2) 

The transition can be viewed as a weight function w(j, i) that takes the two indices 

j and i as parameters such that u>(j, i) — Y^ m =j+i P m ~ i-/ — I" C- The dynamic 
programming is then equivalent to a shortest path from to n. 

The weight function does not have the Monge property, as demonstrated by the 
vector P = 1, P 2 = 2, P 3 = 0, P 4 = 2, . . . , P 2k -i = 0, P 2k = 2, P 2k +i = 1. When 
C = 1, thee optimal choices of j for i = 1, . . . , 2k are j = i — 1, i.e., we fit one 
segment per point. However, once we add in P 2 k+i the optimal solution changes to to 
fitting all points on a single segment. Therefore, choosing a transition to ji over one to 
j 2 at some i does not allow us to discard j 2 from future considerations. This is one of 
the main difficulties for applying techniques based on the increasing order of decision 
points, such as the method of Eppstein etal. [ 1 2] , to reduce the complexity of the O (n 2 ) 
algorithm in [26]. 

Let DP t = OPTi - Y?m=i P m- We claim that DP* is the solution to a simpler 
optimization problem: 

Lemma 1. DPi satisfies the following optimization formula: 

Dp = [ min, <4 DPj - + C if i>0 

1 [ ifi = 



Proof. As J2m=i Pm = 0' trie resu lt is true for i = 0. If i > 0, substituting in equa- 
tion 2 gives: 

DPi = OPT, - Pi 

m— 1 

= rmnOPT j - {Si - Sj)2 + £ i£ - £ P* + C 

m— j + 1 rn—1 

= rninDP j + Y J Pl-^ ^+ £ P 2 - £ P* + C 

3<i * — ' z — 1 * — ' * — ' 

m— 1 m— j + 1 rn—1 

= minPP - ( ^'~^ )2 +C 

j<» 2 - j 

Elimination of the terms 53L=i f rom both sides gives our result. Observe that 
the second order moments of OPTi are absent from DPi. 

We can use i) to denote the shifted weight function, aka. w(j, i) = — \_f + 
C, it's easy to check that k; (j, i) = w(j, i) + Ylm=i 

3.1 0(n 1,5 log ( — )) algorithm to approximate within additive e 



Algorithm 1 Approximation within additive e using 4D halfspace queries 
initialize 4D halfspace query structure Q 
for i = 1 to 7i do 

low «— 
high n(7 2 

while high — low > e/ndo 
m (/ow + /ii<?/i) /2 

if Q.intersect((m, i, Si, —l) T x < m ■ i + Sf) then 

high <— m 
else 

iotu m 
end if 
end while 

DPi <- (low + high) )2 
x <- (i, DPi, 25,, S? + DP,i) 
Q.insert(x) 
end for 



Our proposed method (as shown in Algorithm 1) uses the results of [22] to obtain 
a fast algorithm for the additive approximation variant of the problem. Specifically, 
the algorithm initializes a 4-dimensional halfspace query data structure. The algorithm 



then uses binary searches to compute an accurate estimate of the value DPi for i = 
1, . . . , n. As errors are introduced at each term, we use DPi to denote the approximate 
value of DPi calculated by earlier iterations of the binary search, and DPi to be the 
optimum value of the transition function computed using the approximate values of 
DPj. Formally: 

DPi = min i<i DPj - ^ ^- +C. 

* — ' 

w{j.i) 

Theorem 3 shows that it's sufficient to approximate DP t to within an additive e/n of 
DPi in order to approximate DP n within additive e. Let U = max {VC, P±, . . . , P n }, 
the maximum value of the objective function is upper-bounded by the cost one would 
incur from declaring there is single interval with x = 0, giving a bound of U 2 n. There- 
fore 0(log(^^)) = 0(log (y)) iterations of binary search at each i are sufficient. 

To check whether DPi > x + C, we need to solve the decision problem of whether 
there exists a j < i such that the following inequality holds: 



x + C > DPi 

3j <i,x> DP, - {Si ' Sj)2 

3j < i, x(i - j) > D~Pj(i - j) - (Si - S 3 f 

3j <i,xi + > xj + DPji + 2SiSj sj DP 3 j 

The term on the right hand side can be interpreted as the dot product between 
(x, i, S h -1) and (j, D~P jt 2Sj, S'j+jDPj). If we think of the values (j, D~P j ,2S j ,S^+ 
DPjj) as points in M 4 , the decision problem becomes whether the intersection of a 
point set with a halfplane is null. If the point set has size n, this can be done in O(n 5 ) 
per query and O(logn) amortized time per insertion of a point[22]. So the optimal 
value of DPi can be found within an additive constant of e/n using the binary search 
in O(n 5 log (— )) time. Therefore, we can proceed along the indices from 1 to n, find 
the approximately optimal value of OPTi and insert a point corresponding to it into the 
query structure, getting an algorithm that runs in 0(n l b log (— )) time. 

The following theorem states that a small error at each step suffices to give an over- 
all good approximation. We show inductively that if DPi approximates DPi within 
e/n, DPi is within ie/n additive error from the optimal value DPi. For the proof of 
Theorem 3, see the Appendix 4. 

Theorem 3. Let DPi be the approximation of our algorithm to DPi. Then, the follow- 
ing inequality holds: 

~ ez 

\DPi-DPi\<- (4) 
n 

By substituting in Theorem 3 i = n we obtain the following Corollary, proving that 
our algorithm is an approximation algorithm within e additive error. 



Corollary 1. Let DP n be the approximation of our algorithm to DP n . Then, the fol- 
lowing inequality holds: 

\DP n -DP n \<e (5) 

3.2 0(n log 2 n/e) algorithm to approximate within multiplicative e 

Once again, consider the transition function w in the optimization formula for OPTi. 
Our approach is based on approximating w with a logarithmic number of Monge func- 
tions, while incurring a multiplicative error of at most e. When viewed from the context 
of a shortest path problem, we are perturbing the weight of each edge by e. So as long 
as the weight of each edge is positive, the length of any path, and therefore the optimal 
answer computed in the perturbed version, is off by a factor of e as well. 

The main idea of our algorithm is as follows: we break our initial problem into a 
small (logarithmic) number of Monge optimization subproblems which we can speed 
up using existing techniques, e.g., [12]. We achieve this by detecting which part of 
w(j, i) causes w(j, i) not to be Monge, and then by finding intervals in which we can 
approximate it accurately with a constant. We also make sure the optimal breakpoints 
of the Monge sub-problems lie in the specified subintervals by making the function 
outside that subinterval arbitrarily large while maintaining its Monge property. 



Algorithm 2 Approximation within factor of e using Monge function search 

Maintain m — log nj log (1 + e) Monge function search data structures Qi, . . . , Q m {Each 
Q k corresponds to a Monge function w k (j, i) such that w k (j, i) = (X))n=j+i(* ~~ j) P m ~ 
(Si - Sj) 2 )/{1 + e) k ) + C if (1 + e) fe_1 < i -j < (1 + e) fe , otherwise arbitrarily large .} 
OPT <- 
for k = 1 to m do 

Qfe.oo <- 
end for 

for i = 1 to n do 

OPTi <- oo 

for k = 1 to m do 

localmhifc <- min 3 <i Qk-aj + w k {j, i). 

OPT, «- mm{OPTi, localmin + C} 
end for 

for k = 1 to m do 
Q k .ai «- OPT, 
end for 
end for 



We let w'(j,i) = - l) P l - - S 3 ) 2 - In other words, w(j,i) = 

w'(j, - j) + C. Since Var(X) = E(X 2 ) - E(X] 2 , an alternate formulation of 

w'(j,i) is: 

w'{j,i)= {P mi -Pm 2 ) 2 



Lemma 2. w'(j,i) is Monge, in other words, for any £1 < £2 < £3 < H> w'(ii, £4) + 
w'{i 2 , 13) > w'(h, £3) + w'(i 2 , £4)- 

Proof. Since each term in the summation is non-negative, it suffices to show that any 
pair of (mj, m 2 ) is summed as many times on the left hand side as on the right hand 
side. If £2 + 1 < mi < m-2 < £3, each term is counted twice on each side. Otherwise, 
each term is counted once on the left hand side since £1 + 1 < mi < m-i < £4 and at 
most once on the right hand side since [£1 + 1, £3] n [£2 + 1, £4] = [£2 + 1, £3]- 

Also, as w'(i,j), £ — j and C are all non-negative, approximating w'(i,j)/(i — j) 
within a factor of e gives an approximation of w(i, j) within a factor of e. For each £, we 
try to approximate i—j (j < £) with a constant c' such that c' < i—j < c'(l + e). Since 
1 < £ — j < n, we only need log( 1+e ) n = log n/ log(l + e) distinct values of c' for all 
transitions. Note that this is equivalent to j being in the range [I, r] = [£— c'(l+e),i— c']. 

Since d is a constant, w'(j,i)/c' is also a Monge function. However, notice that 
we can only use j when c' < i — j < c'(l + e). This constraint on pairs can 
be enforced by setting Wk(j, £) to arbitrarily large values when (j, £) do not satisfy the 
condition. This ensures that j will not be used as a breakpoint for i. Furthermore, Wk 
needs to be adjusted to remain Monge in order to keep the assumptions of Theorem 2 
valid. Since the + £4] is the longest and [£2 + 1, £3] is the shortest range respectively, 
one possibility is to assign exponentially large values to very long and short ranges. The 
following is one possibility when M is an arbitrarily large positive constant: 



Lemma 3. Wk is Monge. That is, for any 4-tuple £1 < £2 < £3 < £4, Wk(ii, £4) + 
w k {i2,h) > w k (ix,i 3 ) + w fc (£ 2 ,£ 4 )- 

Proof. As M is arbitrarily large, we may assume Wk(j,i) > w'(j,i). If c' < £3 — 
£i,£4-£2 < (l + e)c', thenu; fe (£i,£ 3 ) + u; fc (£2,£4) = (l/c')(w'(£i, £3) + w'(i 2 , £4)) < 
(l/c')(V (£i,£4) + w'(i 2 ,i 3 )) < Wk{h,h) + w fe (£ 2 ,£ 3 ) 

We assume £3 — £1 < £4 — £2 since the mirror case can be considered similarly. 
Suppose £3 — £1 < c' and £4 — £2 < c'(l + e). Then as £ 2 > £1, £3 — £2 < £3 — £1 — 1. 
w k {i 2 ,iz) = 2 n -^ +l2 M > 2-2 n ~ i3 ~ il > w k {h, £3) + Wkfa, h) The case of c' < 
£3 — £1 and c'(l + e) < £4 — £2 can be done similarly. 

Suppose £3 — £1 < c' and c'(l + e) < £4 — £2. Then £3 — £2 < £3 — £1 and 
£4 - £1 > £4 - £2 gives Wk{h,u) > Wk{i2,iA) and Wk{i2-,h) > Wk{ii,h). Adding 
them gives the desired result. 

So the equation for OPTi becomes: 




(6) 



OPT, = min OPT, + E^lH + C 



min OPTj + w k (j,i) + C 



Note that storing values of the form 2 M using only their exponent k suffices for 
comparison, so this adjustment does result in any change in runtime. By Theorem 2, 
the Monge function optimization problem for each wu can be solved in 0(n\ogn) 
time, giving a total runtime of 0(n log 2 n/e). Pseudocode for this algorithm is shown 
in Algorithm 2. The algorithm uses m = log n/e copies of the algorithm mentioned in 
theorem 2 implicitly. 

4 Conclusions 

In this work we present two new techniques for optimizing dynamic programming that 
can handle cost functions not treated by other standard methods. The first algorithm 
approximates the optimal value of our objective function within additive e error and runs 
in 0(n 15 log (— )) time, where U = max^ /j. The efficiency of our algorithm is based 
on the work of Jin Matousek [22], since our binary search on the value of a costant- 
shifted variant of the objected reduces to performing halfspace queries. The second 
algorithm solves a similar recurrence within a factor of e and runs in O (n log 2 n/e) . The 
new technique introduced by our algorithm is the decomposition of the initial problem 
into a small (logarithmic) number of Monge optimization subproblems which we can 
speed up using existing techniques [12]. 

While the recurrences we solve are not treated using existing techniques for speed- 
ing up dynamic programming in an exact way, the results we obtain suggest that the 
0(n 2 ) bound is not tight, i.e., there exists more structure which we can take advantage 
of. For example, the following lemma holds(see Appendix for a proof): 

Lemma 4. If \Pi 1 — Pi 2 \ > 2%/2C ? , then in the optimal solution of the dynamic pro- 
gramming using Li norm, i\ and i% are in different segments. 

In future work, we plan to exploit the structure inherent to our problem to obtain a 
faster, exact algorithm. 
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Appendix 

The proof of Theorem 3 follows: 

Proof. We use induction on the number of points. Using the same notation as above, let 
DPi = mirij<i DPj — w(j, i) + C. By construction the following inequality holds: 

\DPi-DPi\<-Vi = l,...,n (7) 

n 

When i = 1 it is clear that \DP\ — DP\\ < —. Our inductive hypothesis is the 
following: 

\DPj -DP,\< J -^yj <t (8) 
It suffices to show that the following inequality holds: 

\DPi - DPi\ < { — ^ (9) 
n 

since then by the triangular inequality we obtain: 

^ > \DPi - DPi\ + \DPi - DPi\ > \DPi - DPi\. 
Let j*,j be the optimum breakpoints for DPi and DPi respectively, < i — 1. 

DPi = DP 3 , + w{j*,i) + C < 

< DPj+w(j,i) + C < 

< DP-, + w(], i) + C+ J -(by 8) = 

J n 

= DP, + — < 
n 

<DP l + ^l 
n 

Similarly we obtain: 

DPi = D~P 3 + w(j, i) + C < 

< DP 3 , + w(j*,i) + C< 

< DP* + w(j*,i) + C+ J —(by 8) = 

n 

= DP, + 3 — < 
n 

<DP l + ^l 
n 

Combining the above two inequalities, we obtain 9. QED 



The proof of Lemma 4 follows: 

Proof. The proof is by contradiction. Suppose the optimal solution has a segment [i, j] 
where i < i\ < 12 < j, and its optimal x value is x* . Then consider splitting it into 5 
intervals [i, i\ — 1], [ii, ii], [ii + — 1], [12,12], [«2 + 1, ?i- If we let x — x* in the 
intervals not containing ii and 12, their values are same as before. Also, as \P^ — Pi 2 \ > 

2V2C, (P h - x) 2 + (P i2 - xf > 2V2C 2 = 4C. So by letting z = P h in [ii,^] 
and x = Pi 2 in [12,^2], the total decreases by more than AC. This is more than the 
added penalty of having 4 more segments, a contradiction with the optimality of the 
segmentation. QED 



